benches/indexing_benches.rs
Outdated
    let _ = writer.add_document(doc).unwrap();
    });

    let _ = writer.commit();
I think this is a tough thing to benchmark. The I/O really happens in the commit, where any remaining buffered docs are written to disk. I wouldn't try benchmarking I/O, I would stick to, say, searching through docs that are buffered in memory.
Imagine your service going this way:
0. Fill the index with juicy molecules
1. Cut down size of index based on atom count, etc
2. Load the top 1000 matching docs into cheminee's memory
3. Rank those 1000 matching docs using search routine $XYZ
4. Return the top N hits
I think you want to create the output vec from step 2 in static memory, somehow, and then you just want to benchmark the search routine of step 3 and assert the stable sort going into step 4.
Fake 1/2, benchmark 3, and assert that 4 is what you expect. If you try to benchmark the actual I/O to get through 1/2, you will find it's highly variable and does not give a reliable picture.
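To make that concrete, here is a rough sketch of the shape I have in mind. Everything in it is a placeholder: `Hit`, `fake_candidates`, `rank`, and `time_rank` are hypothetical names, not cheminee or tantivy APIs, and the scoring is a dummy. It uses plain `std::time::Instant` timing so it compiles on stable Rust; the same body could sit inside a `Bencher::iter` closure instead.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a document that has already been matched.
#[derive(Clone, Debug, PartialEq)]
pub struct Hit {
    pub id: u32,
    pub score: u32,
}

/// Fakes steps 1 and 2: builds the candidate set purely in memory,
/// so no index, commit, or disk access is involved.
pub fn fake_candidates(n: u32) -> Vec<Hit> {
    (0..n).map(|id| Hit { id, score: id % 10 }).collect()
}

/// Placeholder for the search routine under test (step 3):
/// ranks by descending score and keeps the first `top_n` hits.
/// `sort_by` is a stable sort, so ties keep their input order.
pub fn rank(mut hits: Vec<Hit>, top_n: usize) -> Vec<Hit> {
    hits.sort_by(|a, b| b.score.cmp(&a.score));
    hits.truncate(top_n);
    hits
}

/// Times only the in-memory ranking and returns the mean duration per
/// iteration. The per-iteration clone of the input is included in the
/// measurement. `iters` must be greater than zero.
pub fn time_rank(candidates: &[Hit], top_n: usize, iters: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the work.
        black_box(rank(black_box(candidates.to_vec()), top_n));
    }
    start.elapsed() / iters
}
```

The assertion for step 4 then checks the ordering once, outside the timed loop: with the dummy scores above, the top hits should all have score 9 and appear in ascending `id` order, because the sort is stable.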
benches/search_benches.rs
Outdated
    use std::collections::{HashMap, HashSet};
    use std::ops::Deref;
    use tantivy::schema::Field;
    use test::Bencher;
This include block is getting unwieldy. I'll do a future PR to reformat our project. Just a note to myself here...
benches/search_benches.rs
Outdated
    let searcher = reader.searcher();
    let results = basic_search(&searcher, &query, 100).unwrap();
    let _final_results = aggregate_query_hits(searcher, results, &query).unwrap();
    });
Again, I think benchmarking I/O is going to provide an inaccurate picture.
…ng, and searching
How about we just bench the core functionality? This has already been illuminating for determining the slowest bits of functionality. Standardization of molecules looks to be the slowest step by far, which I guess makes sense.
Description
Resolves #108 by adding benchmarks for indexing and searching.
We have now added benchmarks for most of the core functionality used for indexing and searching.
These benchmarks make it clear, for example, that our molecular standardization step is more computationally intensive than most of the other functionality.